cuda : support non-contiguous i32 to i32 copy #17326

CISC · 2025-11-17T16:33:27Z

CISC · 2025-11-17T19:51:03Z

@jeffbolznv @0cc4m Looks like Vulkan could do with support too (but at least it didn't crash like CUDA did):
https://github.com/ggml-org/llama.cpp/actions/runs/19436977342/job/55610305113?pr=17326#step:3:7370

jeffbolznv · 2025-11-17T19:56:13Z

I thought Vulkan already supported this, but I'll check what's going on.

jeffbolznv · 2025-11-17T20:29:59Z

> @jeffbolznv @0cc4m Looks like Vulkan could do with support too (but at least it didn't crash like CUDA did): https://github.com/ggml-org/llama.cpp/actions/runs/19436977342/job/55610305113?pr=17326#step:3:7370

Fixed in #17328

CISC · 2025-11-19T09:34:04Z

@slaren gentle ping

JohannesGaessler

While you're at it, can you rename the function to ggml_cpy_scalar and either remove the indentation or make it consistent?

CISC · 2025-11-21T22:03:16Z

> While you're at it, can you rename the function to ggml_cpy_scalar and either remove the indentation or make it consistent?

There's a system to the indentation: the parameters are aligned with those of similar calls. It still looks messy because everything is interleaved, though, so I guess we might as well remove it.

Edit: The parameters look nicer and much more consistent on their own lines.
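For readers unfamiliar with the term in the PR title, here is a minimal host-side sketch of what a non-contiguous i32 to i32 copy has to do. It is not the code from this PR. `TensorView`, `byte_offset`, `cpy_i32_i32`, and `transpose_example` are hypothetical names, and the `ne`/`nb` layout (element counts and byte strides per dimension) is only loosely modeled on ggml's tensor fields. The assumption is that a GPU kernel would perform the same index decomposition once per thread instead of in a loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical view of a 4-D tensor: ne = elements per dimension,
// nb = byte stride per dimension (loosely modeled on ggml, not its real struct).
struct TensorView {
    char *  data;
    int64_t ne[4];
    size_t  nb[4];
};

// Byte offset of the i-th logical element, with dimension 0 varying fastest.
static size_t byte_offset(const TensorView & t, int64_t i) {
    const int64_t i0 = i % t.ne[0]; i /= t.ne[0];
    const int64_t i1 = i % t.ne[1]; i /= t.ne[1];
    const int64_t i2 = i % t.ne[2]; i /= t.ne[2];
    const int64_t i3 = i;
    return i0*t.nb[0] + i1*t.nb[1] + i2*t.nb[2] + i3*t.nb[3];
}

// Copy every int32 element from src to dst. Both views may use arbitrary
// strides, but they must hold the same number of elements.
static void cpy_i32_i32(const TensorView & src, const TensorView & dst) {
    const int64_t n = src.ne[0]*src.ne[1]*src.ne[2]*src.ne[3];
    assert(n == dst.ne[0]*dst.ne[1]*dst.ne[2]*dst.ne[3]);
    for (int64_t i = 0; i < n; ++i) {
        int32_t v;
        std::memcpy(&v, src.data + byte_offset(src, i), sizeof(v));
        std::memcpy(dst.data + byte_offset(dst, i), &v, sizeof(v));
    }
}

// Copy a transposed (hence non-contiguous) view of a 2x3 row-major matrix
// into a contiguous destination buffer.
static std::vector<int32_t> transpose_example() {
    std::vector<int32_t> a = {0, 1, 2, 3, 4, 5};
    std::vector<int32_t> out(6);
    const size_t s = sizeof(int32_t);
    // Transposed source: swapping the strides of dims 0 and 1 makes it non-contiguous.
    TensorView src = { reinterpret_cast<char *>(a.data()),   {2, 3, 1, 1}, {3*s, s,   6*s, 6*s} };
    // Contiguous destination with the same logical shape.
    TensorView dst = { reinterpret_cast<char *>(out.data()), {2, 3, 1, 1}, {s,   2*s, 6*s, 6*s} };
    cpy_i32_i32(src, dst);
    return out;
}
```

Calling `transpose_example()` yields the transposed matrix laid out contiguously. Because the copy moves raw 4-byte values without any conversion, the same element-wise path works for integers as well as floats, which fits the reviewer's suggestion of a type-neutral `scalar` name.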

CISC added 2 commits November 17, 2025 17:31

support non-contiguous i32 to i32 copy

8220b6d

add tests

2ba2b58

CISC requested a review from slaren as a code owner November 17, 2025 16:33

CISC requested a review from ggerganov November 17, 2025 16:34

DajanaV mentioned this pull request Nov 17, 2025

UPSTREAM PR #17326: cuda : support non-contiguous i32 to i32 copy auroralabs-loci/llama.cpp#236

Open

github-actions bot added the testing, Nvidia GPU, and ggml labels Nov 17, 2025

slaren requested a review from JohannesGaessler November 20, 2025 17:00

Merge branch 'master' into cisc/cuda-noncont-cpy-i32-to-i32

2af9bd1

JohannesGaessler approved these changes Nov 21, 2025

rename cpy_flt to cpy_scalar and reindent params

ea0ec13
